Scientific Progress


This  research  aims  to  develop  fundamental  theories  and  practical  protocols  for  tactical 
communication  networks  of  cognitive  radios.  It  focuses  on  three  key  areas  of  cognitive 
networking:  (i)  opportunity  sensing  and  cognition;  (ii)  opportunity  tracking  and  exploitation; 
and  (iii)  cognitive  networking.  In  the  following,  we  summarize  our  major  scientific  findings 
in  each  of  these  three  research  areas. 


1  Opportunity  Sensing  and  Cognition 

Under  this  research  topic,  our  focus  is  on  the  quickest  detection  of  spectrum  opportunities 
under  reliability  constraints.  This  work  is  the  first  that  exploits  the  presence  of  multiple 
channels  and  the  heavy-tail  distribution  of  the  connection  time  in  opportunity  detection.  It 
gives  a  fresh  twist  to  the  classic  problem  of  quickest  change  detection  in  a  single  stochastic 
process. 

We  investigate  quickest  detection  of  spectrum  opportunities  in  multiple  channels  where 
the  transmissions  of  primary  users  are  unslotted  and  asynchronous.  We  have  formulated  this 
problem  as  quickest  detection  of  idle/off  periods  in  multiple  on-off  processes.  We  show  that 
this  problem  presents  a  fresh  twist  to  the  classic  signal  processing  problem  of  quickest  change 
detection  that  considers  only  one  stochastic  process.  In  particular,  we  demonstrate  that  the 
key  to  quickest  change  detection  in  multiple  processes  is  to  abandon  the  current  process 
when  its  state  is  unlikely  to  change  in  the  near  future  (as  indicated  by  the  measurements 
obtained  so  far)  and  seek  opportunities  in  a  new  process.  Such  a  channel  switching  strategy 
is  especially  crucial  to  quickest  opportunity  detection  when  the  connection  time  (channel 
busy  duration)  of  the  primary  network  has  a  heavy  tail  distribution. 

In  [1-3],  we  have  established  a  Bayesian  formulation  of  quickest  detection  of  spectrum 
opportunities  in  multiple  unslotted  and  asynchronous  channels  within  a  decision-theoretic 
framework.  Based  on  this  Bayesian  formulation,  we  have  established  the  basic  structures  of 
the  optimal  detection  and  switching  rules  in  both  the  infinite  and  the  finite  regimes  in  terms 
of  the  number  of  on-off  processes. 

In  the  infinite  case,  we  consider  a  large  number  of  homogeneous  independent  on-off 
processes  and  the  user  always  switches  to  a  new  process  should  it  decide  to  abandon  the 
current one. We formulate the problem as a Partially Observable Markov Decision Process
(POMDP). While POMDPs are PSPACE-hard in general, we show that for the problem at
hand, the optimal decision rule has a simple threshold structure when the busy and idle times
of the on-off processes obey (potentially different) geometric/exponential distributions. The
threshold  structure  is  with  respect  to  the  posterior  probability  that  the  process  currently 
being  observed  is  idle  at  the  current  time  (given  the  entire  observation  history). 
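
A minimal sketch of such a threshold rule is given below. It is an illustration under stated assumptions, not the rule derived in [1-3]: time is slotted with geometric busy and idle times, the measurement model is Gaussian, and the two threshold values, the function names, and the reset to the stationary prior after a switch are all hypothetical choices made here.

```python
import math


def predict_idle(pi, p_busy_to_idle, p_idle_to_busy):
    """Propagate P(idle) one slot ahead through the two-state Markov chain."""
    return pi * (1.0 - p_idle_to_busy) + (1.0 - pi) * p_busy_to_idle


def bayes_update(pi, y, mean_idle=0.0, mean_busy=1.0, sigma=1.0):
    """Fold one measurement y into P(idle); Gaussian likelihoods are an assumption."""
    like_idle = math.exp(-0.5 * ((y - mean_idle) / sigma) ** 2)
    like_busy = math.exp(-0.5 * ((y - mean_busy) / sigma) ** 2)
    return pi * like_idle / (pi * like_idle + (1.0 - pi) * like_busy)


def decide(pi, declare_thr=0.95, switch_thr=0.05):
    """Threshold rule on the posterior; both thresholds are illustrative values."""
    if pi >= declare_thr:
        return "declare"   # announce a spectrum opportunity
    if pi <= switch_thr:
        return "switch"    # abandon this process and start on a fresh one
    return "continue"      # keep observing the current process


def run(measurements, p_busy_to_idle, p_idle_to_busy):
    """Return the list of (posterior, action) pairs, stopping at a declaration."""
    stationary = p_busy_to_idle / (p_busy_to_idle + p_idle_to_busy)
    pi = stationary
    trace = []
    for y in measurements:
        prior = predict_idle(pi, p_busy_to_idle, p_idle_to_busy)
        pi = bayes_update(prior, y)
        action = decide(pi)
        trace.append((pi, action))
        if action == "declare":
            break
        if action == "switch":
            pi = stationary  # a new, unobserved process starts from the prior
    return trace
```

In the infinite case a switch always lands on a process that has not been observed before, which is why a single scalar posterior is enough to describe the state in this sketch.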

In the finite case, we address quickest detection with memory: switching back to a previously
visited process is allowed, and measurements obtained during previous visits are taken
into account in decision making. We show that this freedom of switching with memory
significantly complicates the problem. The resulting POMDP changes from a one-dimensional
problem to an N-dimensional problem, where N is the number of on-off processes. Our
objective is to establish the basic structure of the optimal decision rule and develop
low-complexity policies with strong performance. In particular, we show that the optimal action
of declaring always occurs in the process with the largest posterior probability of being in
the idle state. The monotonicity of the detection threshold is also established. Based on
the basic structure of the optimal policy, we propose a low-complexity threshold policy.
Specifically, under the proposed policy, the user always observes the process with the largest
posterior probability of being idle and declares when the largest posterior probability exceeds
the detection threshold. The near-optimal performance of this threshold policy is
demonstrated by a comparison with a full-sensing scheme, which defines an upper bound on the
optimal performance. Furthermore, we show that this low-complexity policy converges to
the optimal policy for the infinite case as the number N of processes increases.
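
One step of the low-complexity policy described above can be sketched as follows. This is a hedged illustration only: the slotted Markov model, the binary sensing outcome with false-alarm and miss probabilities, and every name in the code are assumptions introduced here rather than details taken from [1-3].

```python
def propagate(beliefs, p_busy_to_idle, p_idle_to_busy):
    """Advance every P(idle) by one slot; unobserved processes only drift."""
    return [b * (1.0 - p_idle_to_busy) + (1.0 - b) * p_busy_to_idle
            for b in beliefs]


def update_observed(belief, reported_idle, p_false_alarm, p_miss):
    """Bayes update for the observed process.

    p_false_alarm: P(sensor reports busy | process idle)   (assumed model)
    p_miss:        P(sensor reports idle | process busy)   (assumed model)
    """
    if reported_idle:
        num = belief * (1.0 - p_false_alarm)
        den = num + (1.0 - belief) * p_miss
    else:
        num = belief * p_false_alarm
        den = num + (1.0 - belief) * (1.0 - p_miss)
    return num / den


def threshold_policy_step(beliefs, sense, p_busy_to_idle, p_idle_to_busy,
                          p_false_alarm, p_miss, threshold):
    """Observe the most-likely-idle process, then declare if the top belief is high.

    sense(k) is a caller-supplied function returning True when the sensor
    reports process k as idle. Returns (new_beliefs, observed_index, declared),
    where declared is the index of the declared process or None.
    """
    beliefs = propagate(beliefs, p_busy_to_idle, p_idle_to_busy)
    k = max(range(len(beliefs)), key=lambda i: beliefs[i])
    beliefs[k] = update_observed(beliefs[k], sense(k), p_false_alarm, p_miss)
    best = max(range(len(beliefs)), key=lambda i: beliefs[i])
    declared = best if beliefs[best] >= threshold else None
    return beliefs, k, declared
```

Because the beliefs of all N processes are carried along, measurements from earlier visits keep influencing which process is observed next, which is the "memory" that distinguishes the finite case.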

Furthermore,  we  studied  quickest  detection  under  arbitrarily  distributed  busy  and  idle 
times,  in  particular,  heavy-tail  distributions.  For  heavy-tailed  busy  time,  we  show  that  the 
persistency  property  of  heavy-tail  distributions  makes  it  particularly  important  to  adopt  a 
switching  strategy  (rather  than  waiting  faithfully  in  one  process)  to  avoid  realizations  of 
exceptionally  long  busy  periods. 
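
To illustrate the persistency property, consider a Pareto busy time as one assumed example of a heavy tail (the report does not fix a particular distribution, and the symbols X, x_m, alpha, and lambda are introduced here for illustration). The expected remaining busy time, given that the channel has already been busy for a duration t, can be compared with the memoryless exponential case:

```latex
\mathbb{E}[X - t \mid X > t] =
\begin{cases}
\dfrac{t}{\alpha - 1}, & X \sim \mathrm{Pareto}(x_m, \alpha),\ \alpha > 1,\ t \ge x_m, \\[1ex]
\dfrac{1}{\lambda}, & X \sim \mathrm{Exp}(\lambda).
\end{cases}
```

Under the Pareto assumption the expected residual busy time grows linearly with the time already spent waiting, whereas it stays constant for exponential busy times. The longer a channel has been observed busy, the less attractive it is to keep waiting on it.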

2  Opportunity  Tracking  and  Exploitation 

2.1  Robust  Opportunity  Tracking  under  Imperfect  Sensing 

We have investigated opportunity tracking and exploitation in multiple channels when
spectrum sensing is subject to error. We have formulated the problem as a restless multi-armed
bandit process, which is, in general, PSPACE-hard. Surprisingly, we have shown in [4-7]
that for the class of restless bandit processes most relevant to cognitive radio systems, simple
structural policies exist that achieve strong performance with low complexity. Specifically,
we show that the myopic policy, which maximizes the expected immediate reward while
ignoring the impact of the current action on the future, has a simple structure when the
false alarm probability of the channel state detector is below a certain value. This structure
is semi-universal: it is independent of the Markovian transition probabilities that govern
the stochastic behavior of the primary users. The myopic policy can thus be implemented
with minimal prior knowledge of the primary system, and it automatically tracks model
variations of the primary system. Furthermore, we show that with such a simple and robust
structure, the myopic policy achieves the optimal performance for the two-channel case.
Numerical examples suggest its optimality for the general case.

To  analytically  characterize  the  performance  of  the  myopic  policy  in  the  general  case, 
we  have  developed  closed-form  lower  and  upper  bounds  on  the  steady-state  throughput 
achieved by the myopic policy. The lower bound monotonically approaches the upper
bound as the number of channels increases. This result thus defines the limiting performance
of the myopic policy as the number of channels approaches infinity. Furthermore, by
analyzing  a  genie-aided  system  which  provides  an  upper  bound  on  the  optimal  performance, 
we  have  characterized  the  approximation  factor  of  the  myopic  policy  to  bound  the  worst-case 
performance  loss  of  the  myopic  policy  with  respect  to  the  optimal  policy. 
2.2  Opportunistic Spectrum Access in Self-Similar Primary Traffic

We have also investigated opportunity tracking and exploitation in self-similar primary traffic
with long-range dependence.

In [8], we have investigated MAC protocols for tracking spectrum opportunities in
self-similar primary traffic over multiple channels. We adopt a multiple time scale hierarchical
Markovian model of self-similar traffic and develop a decision-theoretic framework based
on the theory of POMDP for opportunity tracking and exploitation in self-similar primary
traffic. Unfortunately, solving a general POMDP is often intractable due to the exponential
complexity. A simple approach is to implement the myopic policy, which focuses only on
maximizing the immediate reward and ignores the impact of the current action on the future
reward. We have shown in [8] that the myopic policy has a simple and robust structure
under certain conditions. This simple structure obviates the need to know the transition
probabilities of the underlying multiple time scale Markovian model and allows automatic
tracking of variations in the primary traffic model. Compared to Markovian channel models,
the model at hand is more general but requires more parameters. It is thus more important
to have policies that are robust to model mismatch and parameter variations. The
strong performance of the myopic policy with such a simple and robust structure has been
demonstrated through extensive simulation examples.

2.3  Opportunistic  Spectrum  Access  in  Asynchronous  Unslotted 
Primary  Systems 

We have also investigated Opportunistic Spectrum Access (OSA) in unslotted primary
systems. The occupancy of each channel by primary users is modeled as a continuous-time
Markov chain, which has been shown to match well with the spectrum usage in wireless
LANs. The secondary network adopts a slotted transmission structure. At the beginning of
each slot, a secondary user decides which channel to sense and potentially transmit over. The
problem appears to be significantly more complex than its counterpart in slotted primary
systems due to the arbitrary starting and ending times of the primary transmissions and the
half-duplex mode of the secondary user, which prevents it from sensing the channel during a
transmission.

In  [9],  we  have  established  a  certain  equivalence  between  OSA  in  unslotted  primary 
systems  and  that  in  slotted  primary  systems.  This  equivalence  points  to  the  possibility  of 
reducing  the  design  of  OSA  in  unslotted  primary  systems  to  that  in  slotted  primary  systems, 
a  significantly  simpler  problem.  Specifically,  it  is  shown  that  even  though  the  underlying 
primary  systems  are  modeled  as  continuous-time  Markov  chains,  the  joint  design  of  OSA  fits 
into  the  discrete-time  constrained  POMDP  framework  that  we  developed  in  our  prior  work  for 
the  slotted  case.  This  result  is  based  on  the  following  two  key  observations:  (i)  Opportunity 
detection  should  be  formulated  as  detecting  the  channel  state  during  the  transmission  period 
of  a  secondary  user’s  slot  based  on  the  measurements  taken  in  the  sensing  period  of  the  slot; 
(ii)  under  this  formulation  of  opportunity  detection,  the  difference  between  unslotted  and 
slotted  primary  systems  —  that  transmissions  of  primary  users  can  start  and  end  at  arbitrary 
time  instants  —  simply  contributes  to  sensing  errors. 

In [9], we further show that the separation principle that we established in our prior
work for OSA in slotted primary systems is preserved for the unslotted case under certain
conditions. This result does not follow directly from the separation principle for the slotted
case. The main difficulty here is that the operating characteristics (probabilities of false
alarm and miss detection) of the optimal spectrum sensor are time varying and dependent
on the observation and decision history. This significantly enriches the design space and
complicates  the  analysis  of  the  optimal  solution.  In  [9],  we  show  that  when  the  variation 
of  the  false  alarm  probability  with  respect  to  the  observation  and  decision  history  satisfies 
certain  conditions,  the  separation  principle  is  preserved;  the  same  simple,  robust,  and  optimal 
design  of  OSA  can  be  achieved  in  unslotted  primary  systems. 

We have also investigated the extension of our previous result on spectrum opportunity
tracking to non-Markovian channel occupancy models [10,11]. In particular, we consider
two simple round-robin tracking policies for dynamic multi-channel access in cognitive radio
networks: one in which channel switching takes place when the primary user is sensed to
be present, and one in which channel switching takes place when the primary user is
sensed to be absent. Our prior work has shown that these policies are each optimal under
certain conditions when the primary user occupancy on each channel can be described as
an independent two-state Markov chain. In [10], we consider a very general case where
the primary user occupancy on each channel is an arbitrary stationary and ergodic
two-state process, and derive bounds on the performance of the two policies. The bounds provide
insights into conditions under which these extremely simple policies perform well.
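
The two round-robin policies can be sketched in a few lines. The sketch below assumes perfect sensing, a cyclic channel order, and a reward of one unit per slot in which the sensed channel is idle; these modeling choices and the function name are illustrative assumptions, not the setup analyzed in [10,11].

```python
def round_robin_reward(states, switch_when_busy=True):
    """Count idle slots captured by a round-robin tracking policy.

    states[t][i] is 1 if channel i is idle in slot t and 0 if it is busy.
    switch_when_busy=True : stay while the channel is sensed idle, move to
                            the next channel once it is sensed busy.
    switch_when_busy=False: move on after sensing idle, stay while busy.
    """
    num_channels = len(states[0])
    channel = 0
    reward = 0
    for slot in states:
        idle = slot[channel] == 1
        if idle:
            reward += 1  # an idle slot on the sensed channel is exploited
        should_switch = (not idle) if switch_when_busy else idle
        if should_switch:
            channel = (channel + 1) % num_channels  # next channel in cyclic order
    return reward
```

Neither policy needs the transition statistics of the primary traffic: the only input is the current sensing outcome, which is what makes the policies applicable to non-Markovian occupancy models.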

3  Cognitive  Networking 

Under  this  research  topic,  we  focus  on  cognitive  networking  under  unknown  and  dynamic 
models  of  the  coexisting  primary  systems.  Our  technical  approaches  rest  on  continuum 
percolation  and  stochastic  online  learning  and  decision  theory. 

3.1  Connectivity and Multi-Hop Delay of Cognitive Radio Networks

We have addressed in [12,13] the connectivity of large-scale ad hoc heterogeneous wireless
networks, where secondary users exploit channels temporarily unused by primary users and
the existence of a communication link between two secondary users depends not only on
the distance between them but also on the transmitting and receiving activities of nearby
primary users. We have introduced the concept of connectivity region, defined as the set of
density pairs (the density of secondary users and that of primary transmitters) under which
the secondary network is connected. Using theories and techniques from continuum
percolation, we have analytically characterized the connectivity region and revealed the tradeoff
between proximity (the number of neighbors) and the occurrence of spectrum opportunity.
Specifically, we have shown three basic properties of the connectivity region: contiguousness,
monotonicity of the boundary, and uniqueness of the infinite connected component, where the
uniqueness implies the occurrence of a phase transition phenomenon in terms of the almost
sure existence of either zero or one infinite connected component. We have identified and
analyzed two critical densities which jointly specify the profile as well as an outer bound
of the connectivity region. We have also studied the impact of secondary users’ transmission
power on the connectivity region and the conditional average degree of a secondary user,
and designed the transmission power of secondary users to maximize the tolerance of the
primary traffic load. Furthermore, we have established a necessary and a sufficient condition
for connectivity. The necessary condition, which depends on the conditional average degree
of a secondary user, gives another outer bound of the connectivity region, while the sufficient
condition leads to an inner bound of the connectivity region.
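
A rough Monte Carlo probe of a single density pair is sketched below. It is a deliberately simplified stand-in for the model in [12,13]: a finite square replaces the infinite plane, a secondary user is assumed to see an opportunity when no primary transmitter lies within an interference radius (primary receivers are ignored), and all names and parameters are hypothetical. The fraction of secondary users in the largest connected component serves as a finite-size proxy for the presence of an infinite component.

```python
import math
import random


def sample_poisson(mean, rng):
    """Draw a Poisson variate by Knuth's method (fine for moderate means)."""
    limit, k, p = math.exp(-mean), 0, 1.0
    while True:
        p *= rng.random()
        if p <= limit:
            return k
        k += 1


def scatter(density, side, rng):
    """Poisson point process of the given density on a side-by-side square."""
    count = sample_poisson(density * side * side, rng)
    return [(rng.uniform(0, side), rng.uniform(0, side)) for _ in range(count)]


def largest_component_fraction(lam_s, lam_p, side, r, r_int, seed=0):
    """Fraction of secondary users in the largest connected component.

    lam_s, lam_p: densities of secondary users and primary transmitters.
    r:     secondary transmission range.
    r_int: assumed interference radius around each primary transmitter.
    """
    rng = random.Random(seed)
    secondary = scatter(lam_s, side, rng)
    primary = scatter(lam_p, side, rng)
    if not secondary:
        return 0.0
    # Simplified opportunity test: no primary transmitter within r_int.
    free = [all(math.dist(s, p) > r_int for p in primary) for s in secondary]

    parent = list(range(len(secondary)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i in range(len(secondary)):
        for j in range(i + 1, len(secondary)):
            close = math.dist(secondary[i], secondary[j]) <= r
            if free[i] and free[j] and close:
                parent[find(i)] = find(j)

    sizes = {}
    for i in range(len(secondary)):
        root = find(i)
        sizes[root] = sizes.get(root, 0) + 1
    return max(sizes.values()) / len(secondary)
```

Sweeping lam_s and lam_p over a grid and recording where this fraction jumps gives a crude empirical picture of the region boundary, although only the analysis in [12,13] characterizes the true connectivity region.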

We further study the impact of temporal dynamics of the primary traffic on the
connectivity and delay scaling of the secondary networks. In [14], we consider a Poisson distributed
secondary network overlaid with a Poisson distributed primary network in an infinite
two-dimensional Euclidean space. The existence of a communication link between two secondary
users depends not only on their separation but also on the occurrence of the spectrum
opportunity determined by the transmitting and receiving activities of nearby primary users. We
define connectivity via the finiteness of the minimum multihop delay (MMD) between two
randomly chosen secondary users. Using theories and techniques from continuum percolation
and ergodicity, we analytically characterize the connectivity of the secondary network
defined in terms of the almost sure finiteness of the multihop delay, and show the occurrence
of a phase transition phenomenon while studying the impact of the temporal dynamics of
the primary traffic on the connectivity of the secondary network. Specifically, as long as
the primary traffic has some temporal dynamics caused by mobility and/or changes
in traffic load and pattern, the connectivity of the secondary network depends solely on
its own density and is independent of the primary traffic; otherwise the connectivity of the
secondary network requires putting a density-dependent cap on the primary traffic load. We
show that the scaling behavior of the multihop delay depends critically on whether or not
the secondary network is instantaneously connected. In particular, we establish the scaling
law of the minimum multihop delay with respect to the source-destination distance when
the propagation delay is negligible.

3.2  Online Learning for Distributed Spectrum Sharing under Unknown Models

In a distributed secondary network without a central controller or a dedicated control
channel, each secondary user needs to balance choosing the most promising channel with avoiding
competing users without knowing others’ actions and without assuming any prior knowledge
about the primary channel occupancy. We have mathematically formulated this problem
as a decentralized multi-armed bandit process (MAB) [15], which is a generalization of the
classic single-user MAB studied in the seminal paper by Lai and Robbins in
1985. Specifically, we showed in [15] that the minimum regret (where the regret is defined
as the total performance loss with respect to the ideal case with known model and perfect
centralized scheduling) in the decentralized MAB grows at the same logarithmic order as in
the centralized counterpart considered by Lai and Robbins. We also developed a Time
Division Fair Sharing (TDFS) framework for constructing order-optimal and fair decentralized
policies.
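
For concreteness, the regret can be written as follows. The notation is introduced here for illustration and is not taken from [15]: N secondary users share K >= N channels with mean rewards theta_1, ..., theta_K, sigma orders the channels by decreasing mean, and Y(t) is the total reward collected by all users in slot t under a policy pi.

```latex
R_T^{\pi} \;=\; T \sum_{j=1}^{N} \theta_{\sigma(j)}
\;-\; \mathbb{E}_{\pi}\!\left[\, \sum_{t=1}^{T} Y(t) \right],
\qquad
R_T^{\pi} = O(\log T) \;\Longrightarrow\; \frac{R_T^{\pi}}{T} \to 0 .
```

The first term is the reward of the ideal benchmark that always places the N users on the N best channels without collisions. A logarithmic regret order therefore means that the time-averaged loss relative to this benchmark vanishes as the horizon T grows.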

We then further extended this result in three directions. First, the result in [15] is
obtained under the assumption that the primary traffic is i.i.d. over time. We have relaxed
this assumption by considering a more general Markov model (with unknown transition
probabilities) of the primary traffic. Second, the result in [15] assumes perfect sensing at
the secondary users. We have extended the result to handle sensing errors. Third, the result
in [15] assumes a slotted primary system. We have also addressed unslotted primary systems.

Specifically,  in  [16-18],  the  occupancy  of  each  channel  is  modeled  as  a  Markov  chain 
with  unknown  transition  probabilities.  Multiple  distributed  secondary  users  aim  to  learn 
the  primary  traffic  model  and  exploit  the  idle  slots  for  transmission.  The  objective  of  the 
secondary  user  is  to  maximize  the  long-term  throughput  by  designing  an  optimal  channel 
selection  policy  without  knowing  the  traffic  dynamics  of  the  primary  users  and  without 
centralized scheduling among secondary users. We show in [16-18] that the problem leads to
a restless multi-armed bandit with unknown dynamics, a significant variation of the
multi-armed bandit problems that has not been studied in the literature. We have constructed
a  channel  sensing  and  access  policy  that  achieves  a  regret  with  logarithmic  order  when  an 
arbitrary  nontrivial  bound  on  certain  system  parameters  is  known.  When  no  knowledge 
about  the  system  is  available,  we  extend  the  policy  to  achieve  a  regret  arbitrarily  close  to 
the  logarithmic  order.  In  both  cases,  the  throughput  of  the  secondary  network  achieves  the 
maximum value defined by the ideal scenario where the secondary network with N users
knows which N channels are the best and always accesses these N channels through a perfect
centralized scheduling that eliminates collisions.

In  [19,20],  we  address  the  issue  of  sensing  errors  and  their  effect  on  the  learning  ability 
of  the  secondary  users.  We  show  in  [19,20]  that  with  multiple  distributed  secondary  users, 
imperfect  sensing  significantly  complicates  the  problem.  The  main  difficulty  is  that  each 
secondary  user  cannot  distinguish  between  secondary  collisions  caused  by  competition  and 
primary  collisions  caused  by  sensing  errors.  A  failed  transmission  due  to  secondary  collisions 
does  not  reflect  the  channel  quality.  If  a  secondary  user  learns  the  channel  quality  from 
the  history  of  successful  transmissions,  the  best  channels  may  not  be  correctly  identified. 
In other words, collisions among secondary users affect not only the immediate reward but
also the learning ability of each colliding user, which further degrades the long-term
throughput of the system. In [19,20], we formulate multi-user dynamic spectrum access (DSA) with imperfect sensing as
a  variant  of  decentralized  MAB  with  multiple  players  to  take  into  account  the  imperfect 
reward  observation.  We  show  that  the  optimal  system  regret  has  the  same  logarithmic  order 
as  in  the  case  with  perfect  sensing.  A  decentralized  SLCD  (Synchronized  Learning  under 
Corrupted  Data)  policy  is  proposed  to  achieve  the  logarithmic  order  of  the  system  regret. 
Under  this  policy,  the  network  throughput  converges  to  the  same  maximum  throughput  as 
in  the  ideal  case  with  known  model,  centralized  scheduling,  and  perfect  sensing. 

In [21], we address multi-channel opportunistic spectrum access in unslotted primary
systems under unknown models. The primary occupancy of each channel is modeled as a general
on-off  renewal  process.  The  distributions  of  the  busy  and  idle  times  and  the  utilization  factors 
of  all  channels  are  unknown  to  the  secondary  user.  The  objective  of  the  secondary  user  is  to 
identify  and  exploit  the  best  channel  (i.e.,  the  channel  with  the  least  primary  traffic)  through 
efficient  online  learning.  We  have  developed  a  dynamic  channel  access  policy  that  achieves 
the  throughput  offered  by  the  best  channel  under  certain  mild  conditions  on  the  busy/idle 
time  distributions.  More  specifically,  the  cost  associated  with  learning  the  unknown  channel 
occupancy  models  over  a  horizon  of  length  T  diminishes  at  the  rate  of  log  T/T.  The  policy 
is obtained by constructing a hypothetical multi-armed bandit with virtual reward which,
while  not  directly  reflecting  throughput,  preserves  the  ranking  of  the  channels  in  terms  of 
throughput. 

3.3  Multichannel  Estimation  for  Opportunistic  Spectrum  Access 

In  [22,23],  we  address  the  problem  of  estimating  the  parameters  of  the  primary  traffic  in 
multiple  channels  under  a  constraint  on  the  total  sensing  time.  An  accurate  stochastic 
modeling  of  the  primary  system  channel  occupancy  plays  a  crucial  role  in  designing  the 
optimal  algorithms  for  sensing,  tracking,  and  exploiting  spectrum  opportunities.  However, 
such  a  model  may  not  be  known  a  priori  and  must  be  learned  through  sensing  under  a 
constraint on the total amount of sensing time. In [22,23], the primary traffic in each channel
is modeled as a continuous-time Markov on-off process. The objective is to learn the parameters
of  each  channel  under  a  constraint  on  the  total  sensing  time  with  the  performance  measured 
by  the  total  mean  square  error  (MSE)  across  all  channels. 

In [22,23], we obtain the Fisher information matrix and the maximum likelihood
estimator in closed form. Given that the optimal allocation of the total sensing time to multiple
channels depends on the unknown parameters, we propose a sequential estimation strategy
which dynamically adjusts the allocation of sensing time based on the partial learning
results obtained up to the current time. Specifically, the proposed sequential estimation policy
operates under an epoch structure. Within each epoch, channels are sensed in turn, each for
a fraction of the epoch length with the fraction determined based on the current estimate of
the channel parameters. The epoch length grows over time to take advantage of the
increasing accuracy of the estimates. We show in [22,23] that the proposed sequential estimator is
asymptotically efficient, i.e., it achieves the Cramér-Rao Bound (CRB) as the total sensing
time grows.
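
The epoch structure can be sketched as below. This is only a skeleton under stated assumptions: sojourn times are taken to be fully observed so that the textbook exponential-rate estimate applies, the geometric growth of the epoch length is an illustrative choice, and weight_fn is a hypothetical placeholder for the allocation rule that [22,23] derive from the Fisher information and that is not reproduced here.

```python
def mle_rate(sojourn_times):
    """MLE of an exponential rate from fully observed sojourn times.

    Assumes complete observation of each busy (or idle) period, which may
    differ from the sensing model used in the cited work.
    """
    return len(sojourn_times) / sum(sojourn_times)


def allocate_epoch(estimates, epoch_length, weight_fn):
    """Split one epoch among channels in proportion to weight_fn(estimate).

    weight_fn is a placeholder supplied by the caller; it stands in for the
    allocation rule computed from the current parameter estimates.
    """
    weights = [weight_fn(e) for e in estimates]
    total = sum(weights)
    return [epoch_length * w / total for w in weights]


def epoch_lengths(first_length, growth, num_epochs):
    """Epoch lengths that grow geometrically (an assumed growth schedule)."""
    return [first_length * growth ** k for k in range(num_epochs)]
```

A sequential estimator in this style would loop over the epochs, sense each channel for its allocated share, refresh the estimates with mle_rate, and feed the refreshed estimates into the next call to allocate_epoch.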